[SOUND] This lecture is about
Collaborative Filtering.
In this lecture, we're going to continue
the discussion of Recommender Systems.
In particular, we're going to look at
the approach of collaborative filtering.
You have seen this slide before
when we talked about the two
strategies to answer the basic
question will user U like item X.
In the previous lecture,
we looked at the item similarity,
that's content-based filtering.
In this lecture, we're going to
look at the user similarity.
This is a different strategy
called collaborative filtering.
So first of all,
what is collaborative filtering?
It is to make filtering decisions for
individual user based on
the judgement of other users and
that is to say,
we will infer individual's interest or
preferences from that,
of other similar users.
So the general idea is the following.
Given a user u, we are going to
first find the similar users,
u1 through and then we're going to
predict the used preferences based on
the preferences of these similar users,
u1 through.
Now the users similarity here can be
judged based on their similarity.
The preference is on
a common set of items.
Now here you'll see that the exact
content of item doesn't really matter.
We're going to look at the only,
the relationship between the users and
the items.
So this means this approach
is very general if it can be
applied to any items not
just with text objects.
So this approach, it would work well
under the following assumptions.
First users with the same interests
will have similar preferences.
Second, the users with similar preferences
probably share the same interests.
So for example, if the interest of
the user is in information retrieval,
then we can infer the user
probably favor SIGIR papers.
And so those who are interested in
information retrieval researches probably
all favor SIGIR papers,
that's something that we make.
And if this assumption is true,
then it would help collaborative
filtering to work well.
We can also assume that if we
see people favor SIGIR papers,
then we can infer the interest is
probably information retrieval.
So these simple examples,
it seems what makes sense.
And in many cases such as assumption
actually does make sense.
So, another assumption you have to make
is that there are a sufficiently large
number of user preferences
available to us.
So for example, if you see a lot
of ratings of users for movies and
those indicate their
preferences in movies.
And if you have a lot of such data,
then collaborative filtering
can be very effective.
If not, there will be a problem and
that's often called a cold start problem.
That means you don't have many
preferences available, so
the system could not fully take advantage
of collaborative filtering yet.
So let's look at the collaborative
filtering problem in a more formal way.
And so this picture shows that we are in
general considering a lot of users and
showing we're showing m users here.
So, u1 through and we're also
considering a number of objects.
Let's say,
n objects denoted as o1 through on and
then we will assume that the users will
be able to judge those objects and
the user could for example,
give ratings to those items.
For example, those items could be movies,
could be products and
then the users would give ratings
one through five, let's say.
So what you see here is that we have
assumed some ratings available for
some combinations.
So some users have watched movies,
they have rated those movies.
They obviously won't be able
to watch all the movies and
some users may actually
only watch a few movies.
So this is in general a response matrix,
right?
So many item many entries
have unknown values and
what's interesting here is
we could potentially infer
the value of a element in this
matrix based on other values and
that's actually the central question
in collaborative filtering.
And that is,
we assume an unknown function here f,
that would map a pair of user and
object to a rating.
And we have observed there are some
values of this function and
we want to infer the value
of this function for
other pairs that we,
that don't have values available here.
So this is ve, very similar to
other machine learning problems,
where we would know the values of the
function on some training there that and
we hope to predict the the values of
this function on some test there.
All right.
So this is the function approximation.
And how can we pick out the function
based on the observed ratings?
So this is the, the setup.
Now there are many approaches
to solving this problem.
And in fact,
this is a very active research area.
A reason that there are special
conferences dedicated to the problem
is a major conference
devoted to the problem.
[MUSIC]

